Student Solution

Week 8 Project 8

Programming Assignment Development Feedback

Remember, the weekly assignments in this course are intentionally complex. All of the skills and knowledge needed to complete the assignments have been woven into the course modules, but I am not teaching directly to the assignments. Instead, I am asking you to pull together the pieces of the puzzle that will help you solve the problem. It is important that you DO NOT PROCRASTINATE. I am here to support you, but timely feedback requires you to work proactively.

At any time during week 8, you may submit screenshots, draft files, etc. along with specific questions related to the development of this assignment. The goal is not for me to "pre-grade" your work, but to offer guidance and point you in the right direction. I encourage you to make use of this opportunity to refine and develop your work. Refer to my Instructor Introduction and the Course Syllabus for the best ways to contact me in order to get feedback, and for expected response times.

________________________________________

Format and Submission

For the purposes of this assignment, use R Studio. Your submission should have 2 files. The first file should be the R script, saved as "Project8_LastName_FirstName.R" (notice the .R file extension of the script). The second file should be a Microsoft Word file that documents your solutions with screenshots and provides any written responses to the questions below (unless a question specifically requests the response within the comments of the code). Both files are mandatory. You don't need individual screenshots for each question, but your screenshots should cover all questions.

________________________________________

Assignment Scenario and Tasks

***NOTE: For each question, make sure that your work is not repeating something in the course modules or the textbook. Each response must be original and your own submission. You are strictly prohibited from having another person(s) write, review or edit your solution. Failure to follow this may result in a failing grade.

You are a data scientist at the Universal Great Hospital (UGH). UGH is a nonprofit organization focusing on cancer treatment. You are asked to perform analysis on a digitized image of a fine needle aspirate (FNA) of a breast mass to determine whether the mass is malignant or benign. Your manager gives you a data set, breast_cancer_data.csv, which contains the following columns:

• id: ID number
• Diagnosis: The diagnosis of breast tissues (M = malignant, B = benign)
• radius_mean: mean of distances from center to points on the perimeter
• texture_mean: standard deviation of gray-scale values
• perimeter_mean: mean size of the core tumor
• area_mean
• smoothness_mean: mean of local variation in radius lengths
• compactness_mean: mean of perimeter^2 / area - 1.0
• concavity_mean: mean of severity of concave portions of the contour
• concave points_mean: mean for number of concave portions of the contour
• symmetry_mean
• fractal_dimension_mean: mean for "coastline approximation" - 1
• radius_se: standard error for the mean of distances from center to points on the perimeter
• texture_se: standard error for standard deviation of gray-scale values
• perimeter_se
• area_se
• smoothness_se: standard error for local variation in radius lengths
• compactness_se: standard error for perimeter^2 / area - 1.0
• concavity_se: standard error for severity of concave portions of the contour
• concave points_se: standard error for number of concave portions of the contour
• symmetry_se
• fractal_dimension_se: standard error for "coastline approximation" - 1
• radius_worst: "worst" or largest mean value for mean of distances from center to points on the perimeter
• texture_worst: "worst" or largest mean value for standard deviation of gray-scale values
• perimeter_worst
• area_worst
• smoothness_worst: "worst" or largest mean value for local variation in radius lengths
• compactness_worst: "worst" or largest mean value for perimeter^2 / area - 1.0
• concavity_worst: "worst" or largest mean value for severity of concave portions of the contour
• concave points_worst: "worst" or largest mean value for number of concave portions of the contour
• symmetry_worst
• fractal_dimension_worst: "worst" or largest mean value for "coastline approximation" - 1

Data Source: https://www.kaggle.com/uciml/breast-cancer-wisconsin-data

You are asked to perform the following tasks by writing a script in R, and to submit both the R code and a Word document.

1. Load the dataset in breast_cancer_data.csv into R. Call the loaded data breast_cancer_data. Make sure that you have the working directory set to the correct location for the data.

2. Define a user-defined function BoxplotPredictorOnTarget with two arguments, the target and one predictor, that draws a box plot of the predictor for each category of the target. Then use this function to generate the following box plots:
a) area_mean against Diagnosis
b) area_se against Diagnosis
c) texture_mean against Diagnosis

3. Build the following logistic models to forecast the Diagnosis, and recommend the best model to the management based on McFadden's pseudo R squared:
a) forecast the Diagnosis using area_mean
b) forecast the Diagnosis using area_mean and area_se
c) forecast the Diagnosis using area_mean, area_se and texture_mean
d) forecast the Diagnosis using area_mean, area_se, texture_mean and concavity_worst
e) forecast the Diagnosis using area_mean, area_se, texture_mean, concavity_worst and concavity_mean

________________________________________

Projects Rubric

R Codes and Style (15 pts)
• 15 pts, Excellent: R code is easy to read, share, and verify. There are comments in the code. There are no bugs in the script.
• 12 pts, Good: R code is easy to read, share, and verify. There are few comments. There are fewer than 3 bugs in the script.
• 9 pts, Average: R code can be read. There are no comments. There are fewer than 4 bugs in the script.
• 6 pts, Below Average: R code is hard to read. There are no comments. There are more than 5 bugs in the script.
• 3 pts, Insufficient: R code is hard to read. There are no comments. There are more than 5 bugs in the script.

Interpretation and Use of Model in the Word Document (10 pts)
• 10 pts, Excellent: The data and model are accurately interpreted to justify the answer, and sufficient data and modeling are used to defend the main argument.
• 8 pts, Good: The data and model are accurately interpreted to justify the answer, and the model is used to defend the main argument, but it might not be sufficient.
• 6 pts, Average: The data and model are used to defend the main argument, but the idea and model are not accurately interpreted, and the support might not be sufficient.
• 4 pts, Below Average: The data and model are used to defend the main argument, but they are insufficient.
• 2 pts, Insufficient: The data and model are provided, but they are not used to defend the main argument.
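The model comparison asked for in task 3 can be sketched as follows. This is an illustration under stated assumptions, not the graded solution: the data frame demo_data is synthetic stand-in data (the real breast_cancer_data.csv is not reproduced here), and McFaddenR2 is a hypothetical helper name. McFadden's pseudo R squared is computed as 1 minus the ratio of the fitted model's log-likelihood to the log-likelihood of an intercept-only model.

```r
# Sketch only: compare the five candidate logistic models by McFadden's pseudo R squared.
# ASSUMPTION: demo_data is synthetic stand-in data that borrows the assignment's column
# names. With the real file one would start from:
#   breast_cancer_data <- read.csv("breast_cancer_data.csv")

set.seed(42)
n <- 300
demo_data <- data.frame(
  area_mean       = rnorm(n, mean = 650, sd = 150),
  area_se         = rnorm(n, mean = 40, sd = 10),
  texture_mean    = rnorm(n, mean = 19, sd = 4),
  concavity_worst = runif(n, min = 0, max = 0.6),
  concavity_mean  = runif(n, min = 0, max = 0.3)
)

# Simulated diagnosis that depends on area_mean and texture_mean (made-up coefficients).
linear_part <- -9 + 0.012 * demo_data$area_mean + 0.03 * demo_data$texture_mean
demo_data$Diagnosis <- factor(
  ifelse(runif(n) < plogis(linear_part), "M", "B"),
  levels = c("B", "M")  # "B" is the reference level, so glm models P(Diagnosis == "M")
)

# Hypothetical helper: McFadden's pseudo R squared = 1 - logLik(model) / logLik(null model).
McFaddenR2 <- function(model_formula, data) {
  fitted_model <- glm(model_formula, data = data, family = binomial)
  null_model   <- glm(update(model_formula, . ~ 1), data = data, family = binomial)
  1 - as.numeric(logLik(fitted_model)) / as.numeric(logLik(null_model))
}

# The five model specifications a) to e) from the task list.
candidate_formulas <- list(
  a = Diagnosis ~ area_mean,
  b = Diagnosis ~ area_mean + area_se,
  c = Diagnosis ~ area_mean + area_se + texture_mean,
  d = Diagnosis ~ area_mean + area_se + texture_mean + concavity_worst,
  e = Diagnosis ~ area_mean + area_se + texture_mean + concavity_worst + concavity_mean
)

pseudo_r2 <- sapply(candidate_formulas, McFaddenR2, data = demo_data)
print(round(pseudo_r2, 4))
```

Because models a) to e) are nested, the unadjusted McFadden value cannot decrease as predictors are added, so the written recommendation may also want to weigh how much each extra predictor actually improves the fit. The helper refits the null model on the same data, which assumes no rows are dropped for missing values.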

Solution Preview

Define a user-defined function BoxplotPredictorOnTarget with two arguments, the target and one predictor, that draws a box plot of the predictor for each category of the target. Then use this function to generate the following box plots:
a) area_mean against Diagnosis
b) area_se against Diagnosis
c) texture_mean against Diagnosis
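One possible shape for such a function is sketched below. It is a minimal illustration rather than the original solution: the function name and its two arguments come from the task, while demo_data is synthetic stand-in data and the axis-label handling via deparse(substitute(...)) is a design choice made here.

```r
# Sketch only: a two-argument box plot helper, following the task description.
# ASSUMPTION: demo_data is synthetic stand-in data; with the real file the calls would use
# breast_cancer_data (loaded with read.csv("breast_cancer_data.csv")) instead.

BoxplotPredictorOnTarget <- function(target, predictor) {
  # Capture the expressions the caller passed in, to use them as axis labels.
  target_label    <- paste(deparse(substitute(target)), collapse = "")
  predictor_label <- paste(deparse(substitute(predictor)), collapse = "")

  # One box per category of the target.
  plot_stats <- boxplot(
    predictor ~ target,
    xlab = target_label,
    ylab = predictor_label,
    main = paste("Box plot of", predictor_label, "by", target_label)
  )
  invisible(plot_stats)  # return the box plot statistics without printing them
}

# Synthetic example data with the assignment's column names (values are made up).
set.seed(1)
demo_data <- data.frame(
  Diagnosis    = factor(rep(c("B", "M"), each = 50), levels = c("B", "M")),
  area_mean    = c(rnorm(50, mean = 460, sd = 130), rnorm(50, mean = 980, sd = 370)),
  area_se      = c(rnorm(50, mean = 21, sd = 9),    rnorm(50, mean = 73, sd = 60)),
  texture_mean = c(rnorm(50, mean = 18, sd = 4),    rnorm(50, mean = 21.5, sd = 4))
)

# a) area_mean, b) area_se, c) texture_mean, each against Diagnosis.
BoxplotPredictorOnTarget(demo_data$Diagnosis, demo_data$area_mean)
BoxplotPredictorOnTarget(demo_data$Diagnosis, demo_data$area_se)
BoxplotPredictorOnTarget(demo_data$Diagnosis, demo_data$texture_mean)
```

Passing the columns directly keeps the signature at exactly two arguments, as the task requires. An alternative design would take the data frame and column names as strings, but that would need a third argument.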